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Both genetic and environmental factors contribute to the aetiology of multiple sclerosis (MS). More than 50 
genomic regions have been associated with MS susceptibility and vitamin D status also influences the risk of 
this complex disease. However, how these factors interact in disease causation is unclear. We aimed to 
investigate the relationship between vitamin D receptor (VDR) binding in lymphoblastoid cell lines (LCLs), 
chromatin states in LCLs and MS-associated genomic regions. Using the Genomic Hyperbrowser, we 
found that VDR-binding regions overlapped with active regulatory regions [active promoter (AP) and 
strong enhancer (SE)] in LCLs more than expected by chance [45.3-fold enrichment for SE (P<2.0e-05) 
and 63.41-fold enrichment for AP (P< 2.0e-05)]. Approximately 77% of VDR regions were covered by 
either AP or SE elements. The overlap between VDR binding and regulatory elements was significantly 
greater in LCLs than in non-immune cells (P<2.0e-05). VDR binding also occurred within MS regions 
more than expected by chance (3.7-fold enrichment, P<2.0e-05). Furthermore, regions of joint overlap 
SE-VDR and AP-VDR were even more enriched within MS regions and near to several disease-associated 
genes. These findings provide relevant insights into how vitamin D influences the immune system and the 
risk of MS through VDR interactions with the chromatin state inside MS regions. Furthermore, the data pro- 
vide additional evidence for an important role played by B cells in MS. Further analyses in other immune cell 
types and functional studies are warranted to fully elucidate the role of vitamin D in the immune system. 



INTRODUCTION 

It is well known that genetic factors influence susceptibility to 
the complex disease multiple sclerosis (MS) (1). Large 
population-based studies investigating the familial aggregation 
of the disease have shown that the risk of MS is increased 
among biological relatives of patients and positively correlates 



with the degree of kinship (2,3). These findings provided the 
rationale for linkage and association studies which identified 
the main genetic locus for MS within the major histocompati- 
bility complex (MHC) class II region (4-7). However, genetic 
risk factors for MS are not limited to the MHC and the devel- 
opment of genome-wide association study (GWAS) 
approaches has provided further insights into the genetic 
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architecture of the disease (8,9). Recently, The International 
Multiple Sclerosis Genetics Consortium and the Wellcome 
Trust Case Control Consortium 2 performed a GWAS includ- 
ing more than 9000 MS patients and found evidence for over 
50 genomic regions associated to MS (10). The discovery of 
associated genes can greatly aid our knowledge of disease aeti- 
ology and pathogenesis. 

The chromatin landscape of a cell is specific to each cell 
type and, among other roles, determines which regions of 
the genome are accessible to the binding of transcription 
factors and whether transcription occurs or is repressed. A 
recent study mapped a number of chromatin marks across 
nine cell types to systematically characterize regulatory ele- 
ments and their cell-type specificities (11). We have recently 
shown that MS-associated genomic regions overlap with 
strong enhancer (SE), active promoter (AP) and strongly tran- 
scribed regions in lymphoblastoid cell lines (LCLs), more than 
expected by chance and more so than in non-immune cell 
types (12). This is relevant since genetic variants associated 
with a certain disease are likely located within genomic 
regions which are active in the causative cell type(s) and 
thus these results further highlighted the primary immuno- 
logical nature of MS (13). 

Like all complex traits, genes are not the only determinant 
of MS susceptibility. Several lines of evidence support a 
role for vitamin D deficiency in influencing the risk of MS. 
Both vitamin D intake and vitamin D levels are inversely asso- 
ciated with the risk of MS later in life (14,15). Furthermore, 
rare loss-of-function variants in the CYP27B1 gene (which 
encodes the enzyme which activates vitamin D) increase the 
MS risk (16). How vitamin D is involved in MS aetiology is 
unclear, but recent findings strongly support the presence of a 
gene -environment interaction for vitamin D in MS. Vitamin D 
acts in the genome via its cognate nuclear receptor, the vitamin 
D receptor (VDR). Using chromatin immunoprecipitation fol- 
lowed by massively parallel DNA sequencing (ChlP-seq) in 
LCLs, our group has shown the presence of 2776 different 
vitamin D responsive elements throughout the genome bound 
by the VDR. Notably, genetic loci associated with MS were sub- 
stantially enriched for VDR-binding sites (17). 

However, the VDR-MS enrichment analysis was performed 
using old data on MS-associated genomic regions and needs to 
be updated in light of the latest GWAS findings (10). Further- 
more, the relation between VDR binding and chromatin states 
has never been explored. In the present study, our aim was to 
investigate how genomic regions with VDR binding in LCLs 
(VDR regions) (17), chromatin states in LCLs (11) and 
MS-associated genomic regions (MS regions) (10) are related 
to each other. We performed this analysis using the Genomic 
Hyperbrowser (http://hyperbrowser.uio.no/hb/), a powerful 
and robust bioinformatic tool which is able to investigate 
genomic relations genome wide as well as at local scales (18). 



RESULTS 

Global overlap between VDR regions and chromatin states 
in LCLs 

Our first aim was to assess in which chromatin states in LCLs 
VDR binding after vitamin D stimulation was more likely to 



occur. Chromatin states analysed were AP, weak promoter 
(WP), poised promoter (PP), SE, weak enhancer (WE), poly- 
comb repressed (PR), heterochromatic (H), insulator (I), 
strongly transcribed (ST), weakly transcribed (WT) and repeti- 
tive/CNV (Rep/CNV) (11). 

The position of VDR regions was fixed, while chromatin 
intervals were randomized as described in the Materials and 
Methods. On the global scale, the overlap between VDR 
regions and chromatin states was higher than expected by 
chance in 5 out of the 11 chromatin states analysed (SE, 
WE, AP, WP and I). However, the overlap appeared particu- 
larly strong in chromatin states of high regulatory activity 
(SE and AP). Approximately 44 and 33% of VDR regions 
were in fact covered by SE and AP, respectively. The enrich- 
ment values of VDR regions within these chromatin states 
were 45.26 for SE (P < 2.0e-05) and 63.41 for AP (P < 
2.0e — 05). Furthermore, SE and AP were the only chromatin 
states in which the overlap was significantly higher than 
expected in all chromosome arms (41 out of 41) (Table 1). 

Similarly, when VDR-binding regions before vitamin D 
stimulation were used for the analysis, both SE and AP ele- 
ments overlapped with VDR more than expected by chance 
(P<2.0e-05 for both). However, the extent of SE-VDR 
and AP-VDR overlaps was decreased and increased, respect- 
ively (SE-VDR: enrichment = 15.5, proportion of VDR 
covered = 21%; AP-VDR: enrichment = 1 19.6, proportion 
of VDR covered = 48%). Vitamin D stimulation substantially 
increases the number of VDR-binding sites across the genome 
(17). Furthermore, studies have shown that other nuclear 
receptors, such as the glucocorticoid receptor, bind primarily 
to already accessible chromatin regions after stimulation 
(19). For these reasons, we decided to use VDR-binding 
data after vitamin D stimulation for all the following analyses. 

Overlap between VDR regions and chromatin states 
of LCLs versus control cell lines 

On its own, the presence of significant overlap between VDR 
regions and certain chromatin states in LCLs does not neces- 
sarily imply biological significance. Both VDR regions and 
chromatin intervals could just be more likely to be near com- 
monly transcribed genes and a similar degree of overlap could 
be present between VDR regions and chromatin states of other 
cell types. To exclude this hypothesis, we tested whether the 
previously identified LCL chromatin states with significant 
overlap (SE, WE, AP, WP and I) co-localized with VDR 
regions more than the same chromatin states from three add- 
itional cell types (hepatocytes, fibroblasts and keratinocytes). 
We found that LCL-specific SE and AP elements overlapped 
with VDR regions more than SE and AP elements from 
control cell types (P < 2.0e — 05 for all comparisons, Table 2). 

Overlap at the cytoband level: shared preference of bins 
versus detailed base pair overlap 

By simply looking at the proportion of the genome which is 
covered individually by two sets of genomic intervals 
(tracks), one can calculate the expected base pair overlap 
(e.g. if two tracks cover 10% of the genome each, the expected 
overlap would be 1%). However, this assumption is valid only 
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if the two tracks are independent from each other, which is not 
always the case. For example, tracks could be preferentially 
located within the same regions of the genome and this 
could considerably increase their expected base pair overlap. 
This appears to be the case for both VDR-SE and VDR-AP 
relationships as suggested by the strong correlations observed 
between the proportional coverage of SE or AP versus VDR 
regions within each cytoband (Fig. 1). 

In order to understand whether the strong overlap observed 
between VDR regions and SE and AP is not only due to shared 
preference for the same genomic regions but also due to a ten- 
dency towards detailed base pair overlap, we calculated and 
compared the following values: (i) the expected VDR-SE 
and VDR-AP base pair overlaps if all these elements were 
independent (independency of tracks); (ii) the proportional 
coverage of VDR, SE and AP within each cytoband; (iii) the 
expected VDR-SE and VDR-AP base pair overlaps after 
fixing the proportional coverage of VDR, SE and AP elements 
per cytoband (so that each of them is required to occur with 
the same base pair coverage within each cytoband as 
observed); (iv) the actual VDR-SE and VDR-AP observed 
base pair overlaps. All values are shown in Table 3. 

When compared with independency of tracks, fixing cyto- 
band preferences increased the expected overlap by ~ 2-fold 
(from 0.0011 to 0.0023% for VDR-SE and from 0.0005 to 
0.0011% for VDR-AP). This further indicates the strong bias 
of VDR, SE and AP elements towards occurring within the 
same regions of the genome. However, the actual observed 
VDR-SE and VDR-AP overlaps were a further 13 and 21 
times greater than the expected overlap after fixing cytoband 
preferences (P < 0.001, for both VDR-SE and VDR-AP). 
This shows that both the VDR-SE and VDR-AP overlaps are 
not only due to shared preference for the same cytobands 
but also due to a more detailed overlap at the base pair 
level. In addition, we found that regions of overlap between 
VDR and either SE or AP were strikingly enriched for differ- 
entially expressed genes after vitamin D stimulation 
(enrichment = 8.99, P < 2.0e-05). 



Global overlap between VDR regions and MS regions 

We next investigated whether VDR regions were more likely 
than chance to occur within MS-associated regions. We have 
previously observed this using 3-year-old data on 
MS-associated regions (17). In the updated analysis, out of 
58 MS-associated regions, 35 (60.3%) had VDR binding 
inside (Table 4). The positions of MS regions were kept 
fixed, while VDR intervals were randomized as described in 
the Materials and Methods. The observed base pair overlap 
was considerably higher than expected by chance (P < 
2.0e — 05), with a fold enrichment of 3.67. 

However, both VDR-binding sites and MS regions could 
have a tendency to be located near genes. This shared relation 
could itself lead to a higher overlap between VDR and MS 
regions than expected by independency of these tracks. As 
expected, when position relative to genes was taken into 
account in the null model, the expected number of VDR 
regions falling inside MS regions increased from 30 to 45 
(see Materials and Methods for details). However, the 
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Table 2. Comparison of overlap between VDR regions in LCLs and chromatin states in LCLs versus control cell types 
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observed number of VDR regions falling inside MS regions 
was 2.2-fold higher than expected (99 vs. 45) (P < 0.001). 

VDR-AP and VDR-SE overlaps within MS regions 

Given the significant co-occurrence of VDR-SE and VDR-AP 
at the genome-wide level and the observed overlap between 
VDR and MS regions, we investigated the extent of the 
overlap between VDR and SE and between VDR and AP 
within MS regions. Similar to how we analysed the genome- 
wide VDR-SE and VDR-AP overlaps, we calculated and com- 
pared the following values: (i) the expected VDR-SE and 
VDR-AP base pair overlaps within MS regions if all these ele- 
ments were independent; (ii) the proportional coverage of 
VDR, SE and AP within each MS region; (iii) the expected 
VDR-SE and VDR-AP base pair overlaps after fixing the pro- 
portional coverage of VDR, SE and AP elements per MS 
region; (iv) the actual VDR-SE and VDR-AP observed base 
pair overlaps within MS regions. All values are shown in 
Table 5. 

When compared with independency of tracks, fixing propor- 
tional coverage of MS regions slightly increased the expected 
overlap (from 0.0133 to 0.0204% for VDR-SE and from 
0.0055 to 0.0079% for VDR-AP). The actual observed 
VDR-SE and VDR-AP overlaps were a further 4.75 and 
14.93 times higher than the expected overlap after fixing for 
MS regions (P < 0.001, for both VDR-SE and VDR-AP). 
Overlap between VDR and either SE or AP elements was 
present in several MS-associated regions and near a number 
of putative candidate genes, including SP140, CD86, 
IL22RA2, CXCR5, TNFRSF1A, ZFP36L1, BATF, IRF8, 
CD40 and TNFRSF6B in addition to a region on chromosome 
6 with no candidate genes (Figs 2 and 3). 

VDR-SE-MS and VDR-AP-MS: a three-way analysis 

We have seen how VDR regions tend to overlap with MS 
regions, SE and AP elements. Furthermore, MS regions also 
overlap with SE and AP intervals more than expected by 
chance (12). A genome-wide plot showing their occurrence 
and relations built using the software Circos is presented in 
Figure 4 (20). It appears therefore more appropriate to look 
at the relations between these genomic elements using a three- 
way rather than two-way (pair-wise) approach. We can do this 
in terms of probability calculus by considering the proportion 
of the genome covered by a certain track A as a probability 
'P(A)'. The observed overlap between two tracks A and B 
can therefore be written as 'P(A&B)=P(A)*P(B | A)' (i.e. 




00 OS .10 IS .20 .00 .01 .02 .03 .04 .05 .06 

SE proportional coverage AP proportional coverage 



Figure 1. Correlation between VDR and SE/AP proportional coverage per 
cytoband (SE-VDR: Pearson correlation coefficient = 0.53, P<le — 10; 
AP-VDR: Pearson correlation coefficient = 0.61, P < le— 10). 



Table 3. Overlap between VDR and SE and between VDR and AP regions at 
the cytoband level 



SE-VDR AP-VDR 



Expected overlap given coverage of the whole 0.001 1% 0.0005% 
genome 

Expected overlap after fixing individual coverage of 0.0023% 0.001 1% 
cytobands 

Observed overlap 0.0307% 0.0231% 

Observed/expected given coverage of whole genome 27.9 46.2 

Observed/expected after fixing coverage of 13.35 21 
cytobands 



the proportion of the genome covered by both A and B is 
equal to the proportion covered by A, multiplied by the pro- 
portion of A covered by B). Extending this to three tracks, 
the observed overlap between A, B and C can be written as 
'P(A&B&C) = P(A)*P(B | A)*P(C | B,A)'. 

As for two-way relations, the expected overlap between 
three tracks A, B and C given independence would simply 
be the product of the coverage proportions of each track 
'P(A)*P(B)*P(C)\ One can then compare the observed 
'P(A&B&C)' against the expected 'P(A)*P(B)*P(C)' value 
under independency of tracks. It is also possible to keep 
some partial dependencies, e.g. fixing the observed overlap 
between A and B throughout the genome to get another 
expected overlap value which takes into account the overlap 
between A and B 'P(A&B)*P(C) = P(C)*P(A)*P(B | A)'. By 
comparing the observed overlap 'P(A&B&C)' against the 
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Table 4. MS-associated regions, candidate genes and presence of VDR binding inside 
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expected 'P(A&B)*P(C)\ one can see if there is any overlap 
preference between the three tracks beyond what is expected 
from the pair-wise overlap between A and B and the propor- 
tion of the genome covered by C. 

We performed these calculations considering SE-VDR-MS 
as well as AP-VDR-MS relations and the results are shown 
in Table 6. The observed overlaps 'P(SE&VDR&MS)' and 
'P(AP&VDR&MS)' were considerably higher than any of 
the expected overlap values both when tracks were considered 
independent and when pair-wise overlaps were taken into 
account. Similar results were also obtained when fixing the 
cytoband preference for the corresponding values of individual 
or pair-wise overlap coverage. 



VDR, SE and AP position relative to genes 
and MS regions 

On average, VDR regions were more likely to be located in the 
upstream region with a particularly strong preference for tran- 
scription start sites (TSS). The lowest VDR coverage was 
observed within genes. As expected, AP elements were also 
preferentially located next to TSS. Accordingly, AP-VDR 
overlaps tended to be located next to TSS. SE regions were 
located both inside and outside genes, with particularly high 
peaks before and after TSS and depleted in TSS itself 
(Fig. 5). The overlap between SE and VDR appeared particu- 
larly frequent in downstream regions. 

We also looked at how VDR, SE and AP elements tended to 
be distributed within MS regions (Fig. 6). Interestingly, VDR 
coverage appeared to be particularly low in the centre of MS 



Table 5. Overlap between VDR and SE and between VDR and AP regions 
within MS regions 

SE-VDR AP-VDR 

Expected overlap given coverage of all MS regions 0.0133% 0.0055% 

Expected overlap after fixing individual coverage of 0.0204% 0.0079% 
MS regions 

Observed overlap 0.0970% 0.1180% 

Observed/expected given coverage of all MS regions 7.29 21.45 

Observed/expected after fixing coverage of MS 4.75 14.93 
regions 



regions. SE elements were clearly more likely to be located at 
the centre of MS regions, while the position of AP elements as 
well as that of AP-VDR and SE-VDR overlaps did not follow 
any particular distribution. 



DISCUSSION 

Assessing whether two or three sets of genomic intervals 
overlap may appear like a simple question to answer. 
However, performing the analysis without bias is more com- 
plicated. For example, one needs to distinguish between 
whether the overlap is due to shared preference for the same 
genomic regions or due to detailed overlap at the base pair 
level. Furthermore, two different tracks could share a similar 
tendency to be located in proximity to a third set of 
genomic intervals (for example genes), and when this 
happens the effect of these confounding tracks needs to be 
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Figure 2. Overlap between VDR regions and SE/AP elements within MS-associated genomic regions (I). 



taken into account. We were able to overcome these issues 
using the Genomic Hyperbrowser (18). 

We recently showed that MS regions overlapped with SE 
and AP elements of LCLs more than expected by chance 
and more than in non-immune cell types (12). We followed 
up these findings and investigated how VDR binding is 
involved in the relation between MS regions and chromatin 
states. We found that: 

(i) Following vitamin D stimulation, VDR-binding regions 
in LCLs overlap with AP and SE intervals in LCLs 
more than expected by chance and more than in non- 
immune cell types. The extent of this overlap is astonish- 
ing with more than 40 and 30% of VDR regions covered 
by SE and AP, respectively. Interestingly, when VDR 
binding in unstimulated cells is considered, the extent 
of SE-VDR and AP-VDR overlaps is, respectively, 
lower and higher than in stimulated conditions. This sug- 
gests that the binding of VDR to enhancer regions is par- 
ticularly augmented after vitamin D stimulation, while 
the presence of VDR in promoter elements is more 
stable. 



(ii) The AP-VDR and SE-VDR overlaps are present both at 
the global (genome wide) and local (cytoband) scales. 
Notably, we were able to show that these elements are 
not only preferentially located within the same genomic 
regions, but also tend to overlap at base pair level, sug- 
gesting that the VDR directly interacts with enhancer 
and promoter elements and likely modifies their action. 
This is further supported by the observation that genes 
with differential expression after vitamin D stimulation 
are preferentially located near sites of SE-VDR and 
AP-VDR overlap. 

(iii) VDR binding is also more likely to occur within MS 
regions when compared with the rest of the genome and 
more than 60% of MS regions are bound by the VDR. 
These include the genomic regions containing the 
candidate genes EVI5, VCAM1, CXCR4, SP140, TME 
M39A, CD86, NFKB1, CXCR5, TNFRSF1A, CLECL1, 
CYP27B1, CLEC16A, IRF8, STATS, TNFSF14, TYK2, 
CD40 and TNFRSF6B . This overlap is also significantly 
higher than expected when genes are considered a con- 
founding track. 
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Figure 3. Overlap between VDR regions and AP/SE elements within MS-associated genomic regions (II). 



(iv) The SE-VDR and AP-VDR overlaps are also present with 
the same characteristics within MS regions, suggesting 
that these relations are relevant for disease aetiology. 
Candidate genes with either SE-VDR or AP-VDR 
overlap nearby included CD86, CXCR5, TNFRSF1A, 
RGS1, SP140, TMEM39A, IL22RA2, BATF, CLEC16A, 
TNFRSF6B, IRF8 and CD40. Interestingly, one of the 
MS-associated regions located on chromosome 6, which 
was void of candidate genes, was characterized by two 
sites of overlap of SE-VDR and AP-VDR, perhaps sug- 
gesting a long-range or frans-regulatory role. 

(v) Based on our three-way analysis, the SE-VDR-MS and 
AP-VDR-MS overlaps are substantially higher than the 
expected values both when tracks are considered to be 
independent and when pair-wise relations are taken into 
account. 

(vi) VDR binding occurs with a low frequency in the centre 
of MS regions. In contrast, SE elements are clearly 
over-represented in the centre of the MS-associated 
regions, while AP intervals do not follow any particular 
distribution. No striking preference for any position 



within MS regions was shown for both AP-VDR and 
SE-VDR overlaps. 

Since chromatin data from vitamin D-stimulated cells were 
not available, the correct interpretation of our results is that 
VDR binds to genomic regions that have a regulatory function. 
This does not imply a causative relation between VDR binding 
and the chromatin profile of a cell. Future studies should test 
this hypothesis comparing VDR and chromatin data before 
and after vitamin D stimulation. However, taken together, 
these findings have a number of important implications. 
First, they suggest that VDR binding influences or interacts 
with the chomatin profile of a cell. The regulatory role 
played by vitamin D at the genomic level likely underlies its 
modulatory actions in the immune system. Both the VDR 
and the CYP27B1 enzyme are expressed in an exceptionally 
wide range of immune cells, including B cells, but also Thl 
and Thl 7 subsets and FOXP3+ regulatory T cells (21). In B 
cells, vitamin D is able to modulate cell proliferation, induce 
apoptosis and inhibit plasma cell differentiation and immuno- 
globulin secretion (22). 
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Figure 4. Genome wide plot of SE, AP, VDR and MS relations using Circos. From the outermost to the innermost, tracks shown are: AP-VDR enrichment/ 
SE-VDR enrichment/SE proportional coverage/AP proportional coverage/VDR proportional frequency/MS regions (as bars). Enrichment values are presented 
on a logarithmic scale. Enrichment and proportional coverage values are calculated and plotted within each cytoband. 



Table 6. Three-way analysis of overlap between SE-VDR-MS and AP-VDR-MS regions 



Proportional coverage (observed and As factors against pure expectation: 

expected) P(T1)*P(T2)*P(T3) 

SE-VDR-MS AP-VDR-MS SE-VDR-MS AP-VDR-MS 



P(T1&T2&T3) 


9.69E- 


-06 


1.18E- 


-05 


87.8 


235.6 


P(T1&T2)*P(T3) 


3.07E- 


-06 


2.30E- 


-06 


27.8 


46.0 


P(T1&T2)*P(T3) (fixed cytoband preference) 


6.33E- 


-06 


5.90E- 


-06 


57.4 


117.9 


P(T1&T3)*P(T2) 


3.47E- 


-07 


1.43E- 


-07 


3.1 


2.8 


P(T1&T3)*P(T2) (fixed cytoband preference) 


1.14E- 


-06 


4.20E- 


-07 


10.4 


8.4 


P(T2&T3)*P(T1) 


4.22E- 


-07 


1.91E- 


-07 


3.8 


3.8 


P(T2&T3)*P(T1) (fixed cytoband preference) 


1.13E- 


-06 


4.90E- 


-07 


10.2 


9.8 


P(T1)*P(T2)*P(T3) (fixed cytoband preference) 


7.09E- 


-07 


2.83E- 


-07 


6.4 


5.7 


P(T1)*P(T2)'P(T3) 


1.10E- 


-07 


5.01E- 


-08 


1.0 


1.0 



T 1 = SE/AP regions, T2 = VDR regions, T3 = MS regions. 



Secondly, the fact that VDR-SE and VDR-AP overlaps are 
seen not only at the genome-wide level, but also inside 
genomic regions associated with the MS risk provides a poten- 
tial biological explanation for the epidemiological observa- 
tions linking vitamin D deficiency to both risk and clinical 
course of MS (15,16,23,24). 

Finally, these data provide insights into the cell types that 
are primarily driving the onset of MS. It would be logical to 



hypothesize that genetic and environmental risk factors 
would have a direct influence on the cell type triggering the 
abnormal immune response in MS patients. Therefore, the 
observation that both genomic regions associated with MS- 
and VDR-binding intervals overlap with genomic regions 
which are active and play a regulatory role in B cell lines 
provide further support for an important role for this cell type 
in the pathogenesis of this devastating disease (25). The 
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Figure 6. Average position + one standard deviation of VDR, SE, AP elements and SE-VDR and AP-VDR overlaps within MS regions (proportional coverage 
on the >»-axis; position within MS regions on the x-axis). 
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presence of oligoclonal bands in the cerebrospinal fluid repre- 
sents the most consistent immunological finding in MS patients 
and B cell abnormalities influence both conversion to clinical- 
ly definite MS, MRI activity, onset of relapses and disease pro- 
gression (26-32). Furthermore, clinical trials have shown that 
antibody-mediated depletion of CD20+ B cells is highly 
effective in diminishing MRI activity and onset of clinical 
relapses (33,34). Unfortunately, a similarly detailed chromatin 
profile of T cells is not yet available and we could not perform 
a direct comparison between B and T cells. 

One possible confounding factor is that both chromatin and 
VDR-binding data come from B cells immortalized via 
Epstein Barr virus (EBV). While we acknowledge that the 
EBV-mediated immortalization process could influence both 
chromatin profile and VDR binding, this could itself be inter- 
esting for MS since EBV infection is another environmental 
risk factor which is strongly implicated in the aetiology of 
this complex disease (35-37). It would therefore be particular- 
ly interesting to investigate how EBV infection can change the 
chromatin features of infected B cells, and whether this modi- 
fies their relation with MS-associated regions and VDR 
binding. 

To conclude, genomic regions with VDR binding in LCLs 
substantially overlap with promoter and enhancer elements 
that are active in LCLs. This is also true in MS-associated 
regions and is therefore relevant for MS aetiology. Further, 
similar analyses in other immunological cell types and func- 
tional studies will be required to fully understand how 
vitamin D can influence the chromatin landscape and gene 
expression outside and inside genomic regions which play a 
role in the aetiology of this devastating disease. 

MATERIALS AND METHODS 

Data acquisition 

Genetic variants associated with the MS risk were obtained 
from the recent GWAS performed by the International Mul- 
tiple Sclerosis Genetics Consortium (IMSGC) and the Well- 
come Trust Case Control Consortium 2 (WTCCC2) (10). 
Since VDR binding could regulate gene expression at 
several kilobase of distance, we decided to define MS 
regions as genomic intervals of 0.25 cM centred on the lead 
associated SNP + 100 kb each side. The chromatin profiles 
of B-lymphoblastoid cells (GM12878), hepatocellular carcin- 
oma cells (HepG2), normal lung fibroblasts (NHLF) and 
normal epidermal keratinocytes (NHEK) were obtained from 
the ENCODE project (11). Briefly, chromatin immunoprecipi- 
tation followed by massively parallel DNA sequencing 
(ChlP-seq) and expression data were used to identify different 
classes of chromatin states: AP, WP, PP, SE, WE, PR, H, I, 
ST, WT and Rep/CNV (11). Genomic regions with VDR 
binding and differentially expressed genes in LCLs before 
and after stimulation for 36 h with 0. 1 iw calcitriol were iden- 
tified as previously described (17). 

Assessing significance and enrichment of base pair overlap 

To assess statistical significance, we defined a null model for 
which the location of track A intervals varied randomly, while 



preserving the empirical segment and inter-segment length 
distribution of track A intervals. Track B intervals were 
fixed. The number of overlapping base pairs between the 
two tracks was calculated for the real data, as well as for 
50 000 Monte Carlo samples from the null model. The 
P-value was calculated in the usual way, i.e. as the proportion 
of Monte Carlo samples being equal to or more extreme than 
the observed overlap. The analyses were performed on both 
the global (whole genome) and chromosome arms scale. 
For the local analysis of chromosome arms, P-values were 
deemed significant at a false discovery rate of 10% (38). 

The enrichment of track A intervals inside track B was 
defined as the ratio of the proportion of track B intervals 
covered by track A intervals, to the proportion of non-track 
B intervals covered by track A intervals. When assessing 
shared preference of bins versus detailed base pair overlap 
and three-way relations, a factor of observed divided by 
expected base pair overlap was used. 

Comparing overlap between VDR regions and chromatin 
states of LCLs versus control cell lines 

To evaluate the overlap of VDR regions and chromatin states 
in LCLs relative to that in other cell lines, we created case- 
control tracks for each chromatin state by removing all parts 
of chromatin intervals that overlapped between LCLs and 
control cells and marking the remaining intervals as case 
(LCL specific intervals) and control (other cell type specific 
intervals). P-values were computed by a Monte Carlo proced- 
ure, in which the case-control labels of chromatin intervals 
were randomly permuted. The observed base pair overlap 
between case intervals and VDR regions were compared 
against the corresponding distribution for 50 000 Monte 
Carlo samples in the usual way. The fold enrichment differ- 
ence in overlap between LCLs and control cells was calculated 
as the ratio between the proportions of case and control inter- 
vals that overlapped with VDR regions. 

Taking into account genes as a confounding track 

To control for a possibly confounding effect of gene proximity 
to the VDR-MS relation, we assessed VDR-MS overlap condi- 
tioning on their common relation to genes, using the approach 
to handle confounder tracks previously described (18). Briefly, 
we estimated the occurrence frequency of observed VDR 
binding in exponentially increasing ranges of distance to 
nearest gene, and under the null model sampled VDR 
binding according to a non-homogeneous Poisson process 
with intensity given by the estimated occurrence frequency 
at a given distance (range) from the nearest gene. MS 
regions were kept at their observed locations. 

Assessing VDR, SE and AP position relative to genes 
and MS regions 

To compare the distribution of VDR, SE and AP regions rela- 
tive to gene position, we divided the genome into regions 
around each Ensembl gene in a way that these extended 
Ensembl gene regions exactly covered the whole genome. 
Overlapping genes were clustered and each nucleotide of the 
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genome assigned to the closest gene giving as many 
gene-associated regions as there are Ensembl genes (after clus- 
tering). Each gene-associated region was then divided into 
three sub-regions: the gene in the middle, upstream regions 
and downstream region at the sides. Gene, upstream and 
downstream regions were further divided into 20 equally 
sized intervals and average coverage of VDR, SE and AP ele- 
ments in each of these intervals was computed. To compare 
the distribution of VDR, SE and AP elements within MS 
regions, we symmetrically extended the central interval of 
0.25 cM centred on the lead associated SNP + 100 kb each 
side, divided each MS region in 20 equally sized intervals 
and computed average VDR, SE and AP coverage. 
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